
Poster: ipadreporter Date: Nov 1, 2013 8:23am
Forum: forums Subject: Can the IA Crawler be aimed at a specific Website?

Is there any way to get Internet Archive to archive an entire news website?
I used to run Arlingtonmercury.org. For many reasons, we cannot keep the site up (it's inaccessible right now). I can, however, open the site back up by paying the hosting company. If I did that, could I request that IA crawl all over it and take snapshots like crazy?


Poster: Nemo_bis (Administrator, Curator, or Staff) Date: Nov 6, 2013 1:35am
Forum: forums Subject: Re: Can the IA Crawler be aimed at a specific Website?

You can try submitting the URL once you bring the site back up; it will be crawled instantly: http://blog.archive.org/2013/10/25/fixing-broken-links/
But the best approach is probably to crawl the website yourself and upload the resulting archive here as an item: http://archiveteam.org/index.php?title=Wget_with_WARC_output
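A minimal Python sketch of both suggestions, assuming the site has already been brought back online. Two things here are assumptions, not details from the linked pages: that Save Page Now can be triggered by requesting `https://web.archive.org/save/` followed by the page URL, and the particular set of GNU Wget flags (the WARC options need wget 1.14 or later). The function names `save_page_now_url`, `wget_warc_command` and `crawl` are hypothetical helpers for illustration. The script only prints the address and command; `crawl()` actually runs wget.

```python
import shlex
import subprocess
from urllib.parse import urlsplit

# Assumption: Save Page Now captures a page when you request this prefix + the URL.
SAVE_PAGE_NOW_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(url: str) -> str:
    """Return the (assumed) Save Page Now address for capturing `url`."""
    return SAVE_PAGE_NOW_PREFIX + url


def wget_warc_command(site: str, warc_name: str) -> list[str]:
    """Build an argv list that mirrors `site` with GNU Wget and records a WARC."""
    parts = urlsplit(site)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"expected an absolute http(s) URL, got {site!r}")
    return [
        "wget",
        "--recursive",               # follow links within the site
        "--level=inf",               # no depth limit
        "--no-parent",               # never climb above the starting directory
        "--page-requisites",         # also fetch images, CSS, scripts each page needs
        "--wait=1",                  # pause between requests to be gentle on the host
        "--random-wait",
        f"--warc-file={warc_name}",  # writes <warc_name>.warc.gz (compressed by default)
        "--warc-cdx",                # also writes a CDX index alongside the WARC
        site,
    ]


def crawl(site: str, warc_name: str) -> None:
    """Run the crawl (requires wget 1.14+ on PATH and network access)."""
    subprocess.run(wget_warc_command(site, warc_name), check=True)


if __name__ == "__main__":
    site = "http://arlingtonmercury.org/"
    print(save_page_now_url(site))
    print(shlex.join(wget_warc_command(site, "arlingtonmercury")))
```

Building the command as a list (rather than one shell string) avoids quoting problems when it is passed to `subprocess.run`; `shlex.join` is only used to show a copy-pasteable version. The resulting `.warc.gz` is the file you would upload as an item.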


Poster: aibek Date: Nov 7, 2013 1:33am
Forum: forums Subject: Re: Can the IA Crawler be aimed at a specific Website?

Nemo_bis,
Is it possible to pass a self-created archive (made with wget's WARC output) on to the Wayback Machine? Once created, you could upload the archive to IA, and anyone interested could download it. But would it show up in the Wayback Machine? Its utility would be limited if it doesn't.


This post was modified by aibek on 2013-11-07 09:33:00


Poster: Nemo_bis (Administrator, Curator, or Staff) Date: Nov 7, 2013 1:33am
Forum: forums Subject: Re: Can the IA Crawler be aimed at a specific Website?

Yes, your WARC will eventually enter the Wayback Machine, though I don't know the details of the process. See https://archive.org/post/421779/submit-warc